7 research outputs found

    The Case for Polymorphic Registers in Dataflow Computing

    Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to exploit them efficiently. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve throughput by up to 56.17X and show that the PRF-augmented system outperforms the GPU for mask sizes of 9×9 or larger, even in bandwidth-constrained systems.

    Customized Vector Instruction Set Architecture

    This paper presents a methodology for synthesizing customized vector ISAs for various application domains targeting high-performance execution. A number of applications from the telecommunication and linear algebra domains have been studied, and custom vector instruction sets have been synthesized. Three algorithms that compute the shortest paths in a directed graph (Dijkstra, Floyd and Bellman-Ford) have been analyzed, along with the widely used Linpack floating-point benchmark. The framework used to customize the ISAs included the GNU C Compiler versions 4.1.2 and 2.7.2.3 and the SimpleScalar-3.0d tool set, extended to simulate customized vector units. The modifications applied to the simulator include the addition of a vector register file, vector functional units and specific vector instructions. The main results can be summarized as follows: overall application speedups of 24.88X for Dijkstra (after both code optimization and vectorization), 4.99X for Floyd, 9.27X for Bellman-Ford and 4.33X for the C version of Linpack. These results suggest a consistent improvement in execution times due to the customized vector instruction sets.

    Design Considerations for a Domain Specific Vector Microarchitecture

    In this article, we analyze the speedup potential of media and signal processing software on vector processors. We evaluate the performance impact of several design decisions, such as the vector register length, memory latency, memory bandwidth and the number of parallel lanes in the datapath. To quantify the influence of these design parameters, we modify SimpleScalar 3.0 by adding new vector instructions, a vector register file, and vector functional units, and simulate several media and signal processing applications. Simulation results indicate that through vectorization we can obtain kernel speedups ranging from 5.36x to 17.34x and application speedups of 1.82x and 1.37x for the MPEG2 encoder and decoder, respectively.

    Iron-Oxide-Nanoparticles-Doped Polyaniline Composite Thin Films

    Iron-oxide-doped polyaniline (PANI-IO) thin films were obtained by the polymerization of aniline monomers and iron oxide solutions in direct current glow discharge plasma in the absence of a buffer gas for the first time. The PANI-IO thin films were deposited on optically polished Si wafers in order to study surface morphology and evaluate their in vitro biocompatibility. The characterization of the coatings was accomplished using scanning electron microscopy (SEM), Fourier-transform infrared spectroscopy (FTIR), atomic force microscopy (AFM), metallographic microscopy (MM), and X-ray photoelectron spectroscopy (XPS). In vitro biocompatibility assessments were also conducted on the PANI-IO thin films. It was observed that a uniform distribution of iron oxide particles inside the PANI layers was obtained. The constituent elements of the coatings were uniformly distributed. The Fe-O bonds were associated with magnetite in the XPS studies. The surface morphology of the PANI-IO thin films was assessed by AFM. The AFM topographies revealed that PANI-IO exhibited the morphology of a uniformly distributed and continuous layer. The viability of Caco-2 cells cultured on the Si substrate and PANI-IO coating was not significantly modified compared to control cells. Moreover, after 24 h of incubation, we observed no increase in LDH activity in media in comparison to the control. In addition, our results revealed that the NO levels for the Si substrate and PANI-IO coating were similar to those found in the control sample.